Back-propagation network


Comparing Biases for Minimal Network Construction with Back-Propagation

Neural Information Processing Systems

The method Rumelhart suggests involves adding penalty terms to the usual error function; this approach can be used to dynamically select the number of hidden units. In this paper we introduce Rumelhart's minimal networks idea and compare two possible biases on the weight search space. These biases are compared on both simple counting problems and a speech recognition problem.
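As an illustration of this kind of bias, the minimal NumPy sketch below takes one gradient step of back-propagation with a quadratic weight-decay penalty added to the sum-squared error; the network shape, data, and the specific quadratic penalty are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

# Minimal sketch: one back-propagation step with a quadratic weight-decay
# penalty added to the usual sum-squared error,
#   E = sum((y - t)^2) + lam * sum(w^2),
# which biases the search toward small-magnitude weights.
# Network shape, data, and the penalty form are illustrative assumptions.

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))          # 8 patterns, 4 inputs
T = rng.integers(0, 2, size=(8, 1))  # binary targets

W1 = rng.normal(scale=0.1, size=(4, 3))   # input -> hidden
W2 = rng.normal(scale=0.1, size=(3, 1))   # hidden -> output
lam, lr = 1e-3, 0.1                       # penalty strength, learning rate

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# forward pass
H = sigmoid(X @ W1)
Y = sigmoid(H @ W2)

# backward pass for the sum-squared error term
dY = (Y - T) * Y * (1 - Y)
dW2 = H.T @ dY
dH = (dY @ W2.T) * H * (1 - H)
dW1 = X.T @ dH

# the penalty contributes 2 * lam * w to each gradient (weight decay)
W2 -= lr * (dW2 + 2 * lam * W2)
W1 -= lr * (dW1 + 2 * lam * W1)
```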


Handwritten Digit Recognition with a Back-Propagation Network

Neural Information Processing Systems

We present an application of back-propagation networks to handwritten digit recognition. Minimal preprocessing of the data was required, but the architecture of the network was highly constrained and specifically designed for the task. The input of the network consists of normalized images of isolated digits. The method achieves a 1% error rate and about a 9% reject rate on zipcode digits provided by the U.S. Postal Service.
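As an illustration of what a "highly constrained" architecture can mean in practice, the sketch below computes one weight-shared feature map over a normalized input image, so many units are driven by the same small set of weights; the sizes and single-map layout are assumptions for illustration, not the paper's exact architecture.

```python
import numpy as np

# Illustrative sketch of an architectural constraint of this kind:
# a feature map whose units all share one small kernel (weight sharing),
# so the layer needs far fewer free parameters than a fully connected
# layer over the same input. Sizes and the single-map layout are
# assumptions for illustration.

rng = np.random.default_rng(0)
image = rng.normal(size=(16, 16))   # one normalized digit image
kernel = rng.normal(size=(5, 5))    # 25 shared weights for the whole map
bias = 0.0

def feature_map(img, k, b):
    """Valid convolution followed by a squashing nonlinearity."""
    kh, kw = k.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * k) + b
    return np.tanh(out)

fmap = feature_map(image, kernel, bias)
print(fmap.shape)  # (12, 12): 144 units, all sharing the same 25 weights
```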


A practical Bayesian framework for back-propagation networks

MacKay, D. J. C.

Classics

A quantitative and practical Bayesian framework is described for learning of mappings in feedforward networks. The framework makes possible (1) objective comparisons between solutions using alternative network architectures, (2) objective stopping rules for network pruning or growing procedures, (3) objective choice of magnitude and type of weight decay terms or additive regularizers (for penalizing large weights, etc.), (4) a measure of the effective number of well-determined parameters in a model, (5) quantified estimates of the error bars on network parameters and on network output, and (6) objective comparisons with alternative learning and interpolation models such as splines and radial basis functions. The Bayesian "evidence" automatically embodies "Occam's razor," penalizing overflexible and overcomplex models. The Bayesian approach helps detect poor underlying assumptions in learning models. For learning models well matched to a problem, a good correlation between generalization ability and the Bayesian evidence is obtained.
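The evidence idea can be made concrete in the exactly tractable case of a linear model with a Gaussian weight prior: the sketch below computes the log evidence for polynomial models of increasing order and exhibits the Occam's-razor behavior described above. The data, basis functions, and hyperparameter values are illustrative assumptions, not MacKay's network setting.

```python
import numpy as np

# Sketch of the "evidence" idea for a linear model with a Gaussian weight
# prior (precision alpha) and Gaussian noise (precision beta), where the
# log evidence is available in closed form and automatically penalizes
# over-flexible models. Data and hyperparameter values are illustrative.

def log_evidence(Phi, t, alpha, beta):
    N, k = Phi.shape
    A = alpha * np.eye(k) + beta * Phi.T @ Phi       # posterior precision
    m = beta * np.linalg.solve(A, Phi.T @ t)         # posterior mean weights
    E = 0.5 * beta * np.sum((t - Phi @ m) ** 2) + 0.5 * alpha * m @ m
    return (0.5 * k * np.log(alpha) + 0.5 * N * np.log(beta) - E
            - 0.5 * np.linalg.slogdet(A)[1] - 0.5 * N * np.log(2 * np.pi))

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
t = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=x.size)

# Compare polynomial models of increasing order: the evidence peaks at a
# moderate complexity rather than growing with model size.
for order in range(1, 9):
    Phi = np.vander(x, order + 1, increasing=True)
    print(order, round(log_evidence(Phi, t, alpha=1e-2, beta=100.0), 2))
```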


Generalization Dynamics in LMS Trained Linear Networks

Chauvin, Yves

Neural Information Processing Systems

Recent progress in network design demonstrates that nonlinear feedforward neural networks can perform impressive pattern classification for a variety of real-world applications (e.g., Le Cun et al., 1990; Waibel et al., 1989). Various simulations and relationships between the neural network and machine learning theoretical literatures also suggest that too large a number of free parameters ("weight overfitting") could substantially reduce generalization performance.
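A small NumPy sketch of the setting studied here, assuming noisy targets and more weights than training patterns: a linear network trained by LMS gradient descent, with training and validation error tracked over epochs so the overtraining dynamics become visible.

```python
import numpy as np

# Sketch of generalization dynamics in an LMS-trained linear network.
# With noisy targets and more free parameters than training patterns,
# validation error typically falls and then rises again while training
# error keeps decreasing. Sizes, noise level, and learning rate are
# illustrative assumptions.

rng = np.random.default_rng(1)
n_train, n_val, d = 20, 200, 30              # fewer training patterns than weights
w_true = rng.normal(size=d)

X_tr = rng.normal(size=(n_train, d))
y_tr = X_tr @ w_true + 0.5 * rng.normal(size=n_train)
X_va = rng.normal(size=(n_val, d))
y_va = X_va @ w_true + 0.5 * rng.normal(size=n_val)

w = np.zeros(d)
lr = 0.01
for epoch in range(1, 501):
    grad = X_tr.T @ (X_tr @ w - y_tr) / n_train   # LMS gradient
    w -= lr * grad
    if epoch % 100 == 0:
        e_tr = np.mean((X_tr @ w - y_tr) ** 2)
        e_va = np.mean((X_va @ w - y_va) ** 2)
        print(f"epoch {epoch:4d}  train {e_tr:.3f}  val {e_va:.3f}")
```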


Relaxation Networks for Large Supervised Learning Problems

Alspector, Joshua, Allen, Robert B., Jayakumar, Anthony, Zeppenfeld, Torsten, Meir, Ronny

Neural Information Processing Systems

Feedback connections are required so that the teacher signal on the output neurons can modify weights during supervised learning. Relaxation methods are needed for learning static patterns with full-time feedback connections. Feedback network learning techniques have not achieved wide popularity because of the still greater computational efficiency of back-propagation. We show by simulation that relaxation networks of the kind we are implementing in VLSI are capable of learning large problems just like back-propagation networks. A microchip incorporates deterministic mean-field theory learning as well as stochastic Boltzmann learning. A multiple-chip electronic system implementing these networks will make high-speed parallel learning in them feasible in the future.
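A minimal sketch of the relaxation (settling) step that underlies deterministic mean-field learning, assuming symmetric weights and a temperature-free tanh update; the learning rule itself, which compares clamped- and free-phase correlations, is only indicated in a comment.

```python
import numpy as np

# Sketch of mean-field relaxation: with symmetric feedback weights, unit
# activations are repeatedly updated as m_i = tanh(sum_j w_ij * m_j + b_i)
# until they settle to a fixed point. Sizes, the temperature-free update,
# and the convergence test are illustrative assumptions.

rng = np.random.default_rng(0)
n = 10
W = rng.normal(scale=0.3, size=(n, n))
W = (W + W.T) / 2.0                 # symmetric feedback connections
np.fill_diagonal(W, 0.0)
b = rng.normal(scale=0.1, size=n)

def relax(W, b, m=None, steps=200, tol=1e-6):
    """Settle mean-field activations to a fixed point."""
    m = np.zeros(len(b)) if m is None else m
    for _ in range(steps):
        m_new = np.tanh(W @ m + b)
        if np.max(np.abs(m_new - m)) < tol:
            return m_new
        m = m_new
    return m

m_free = relax(W, b)
# A mean-field learning step would compare correlations from a "clamped"
# settling (teacher signal held on the outputs) with this free settling:
# dW is proportional to outer(m_clamped, m_clamped) - outer(m_free, m_free).
print(np.round(m_free, 3))
```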


Dynamic Behavior of Constrained Back-Propagation Networks

Chauvin, Yves

Neural Information Processing Systems

It is generally accepted that the generalization performance of back-propagation networks (Rumelhart, Hinton & Williams, 1986) will depend on the relative size of the training data and of the trained network. By analogy with curve-fitting and on theoretical grounds, the generalization performance of the network should decrease as the size of the network and the associated number of degrees of freedom increase (Rumelhart, 1987; Denker et al., 1987; Hanson & Pratt, 1989). This paper examines the dynamics of the standard back-propagation algorithm (BP) and of a constrained back-propagation variation (CBP), designed to adapt the size of the network to the training data base. The performance, learning dynamics, and the representations resulting from the two algorithms are compared.
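As one illustrative reading of "constrained" back-propagation, the sketch below adds the gradient of a penalty on hidden-unit activity to the standard BP gradient, so that hidden units the task does not need are pushed toward inactivity and the effective size of the network shrinks; the quadratic penalty and all sizes are assumptions, not necessarily the paper's exact constraint.

```python
import numpy as np

# Sketch: back-propagation with an added penalty on hidden-unit activity,
#   P = mu * sum_i h_i^2,
# whose gradient is folded into the usual error gradient so unneeded hidden
# units drift toward inactivity. The quadratic form of the penalty and all
# sizes are illustrative assumptions.

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))
T = rng.integers(0, 2, size=(8, 1)).astype(float)

W1 = rng.normal(scale=0.1, size=(4, 6))   # deliberately oversized hidden layer
W2 = rng.normal(scale=0.1, size=(6, 1))
mu, lr = 1e-2, 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(100):
    H = sigmoid(X @ W1)
    Y = sigmoid(H @ W2)

    dY = (Y - T) * Y * (1 - Y)                       # standard BP error signal
    dH = (dY @ W2.T + 2 * mu * H) * H * (1 - H)      # add activation-penalty gradient
    W2 -= lr * (H.T @ dY)
    W1 -= lr * (X.T @ dH)

print("mean hidden activity per unit:", np.round(sigmoid(X @ W1).mean(axis=0), 2))
```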